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Abstract —Massive MIMO communication systems, by virtue 
of utilizing very large number of antennas, have a potential to 
yield higher spectral and energy efficiency in comparison with 
the conventional MIMO systems. In this paper, we consider 
uplink channel estimation in massive MIMO-OFDM systems 
with frequency selective channels. With increased number 
of antennas, the channel estimation problem becomes very 
challenging as exceptionally large number of channel parameters 
have to be estimated. We propose an efficient distributed linear 
minimum mean square error (LMMSE) algorithm that can 
achieve near optimal channel estimates at very low complexity 
by exploiting the strong spatial correlations and symmetry of 
large antenna array elements. The proposed method involves 
solving a (fixed) reduced dimensional LMMSE problem at each 
antenna followed by a repetitive sharing of information through 
collaboration among neighboring antenna elements. To further 
enhance the channel estimates and/or reduce the number 
of reserved pilot tones, we propose a data-aided estimation 
technique that relies on finding a set of most reliable data 
carriers. We also analyse the effect of pilot contamination on 
the mean square error (MSE) performance of different channel 
estimation techniques. Unlike the conventional approaches, 
we use stochastic geometry to obtain analytical expression 
for interference variance (or power) across OFDM frequency 
tones and use it to derive the MSE expressions for different 
algorithms under both noise and pilot contaminated regimes. 
Simulation results validate our analysis and the near optimal 
MSE performance of proposed estimation algorithms. 

Index Terms: Channel estimation, massive MIMO, stochastic 
geometry, OFDM, LMMSE. 

I. Introduction 

In wireless communications, the demand for higher data 
rates has been dramatically increasing mostly owing to the un¬ 
precedented usage of data-hungry devices e.g., smart-phones, 
super-phones, tablets etc., for wireless multimedia applications 
ID. Over the years, the MIMO technology (that exploits 
multiple antennas at the transmitter and/or receiver) has played 
a pivotal role in sustaining the increased data rates. Installing 
multiple antennas offers key advantages such as multiplexing 
gain and diversity gain due to increased spatial reuse m, 
0. The MIMO technology has already been incorporated 
into many wireless products and standards such as WiFi 
IEEE802.1 In 0 , WiMAX IEEE 802.16e 0 , LTE (4G) 0 . 

Recently, it was established that the use of very large 
antenna arrays, typically of the order of few hundreds, at 
the base station (BS) can potentially provide huge gains in 
system throughput, energy efficiency, security and robustness 
of wireless communication systems ( 7 ). Such systems, known 


as massive MIMO or large scale MIMO systems m-ma, 
overcome many limitations of traditional MIMO systems. 
Massive MIMO increases system capacity by simultaneously 
serving tens of users using the same time-frequency resources. 
Moreover, the large number of low power active antennas 
allows to focus energy in a small spatial region by forming 
a sharp beam towards desired users. This additionally implies 
that there will be little intra-cell interference 0- Because of 
these vital advantages, massive MIMO has attracted a lot of 
research interest and is envisioned as an enabling technology 
for next generation (5G) wireless communications DU- 

Hand in hand with the advantages are entirely new research 
challenges that need to be tackled for massive MIMO. The 
bottleneck in achieving the full advantages of massive MIMO 
is the accurate estimation of the channel impulse response 
(CIR) for each transmit-receive antenna pair. Having a very 
large number of antennas means that a significant number of 
channel coefficients need to be estimated — far more than that 
could be handled by traditional pilot-based MIMO channel 
estimation techniques (see lfl2l and references therein). In 
this regard, Bayesian minimum mean square error (MMSE) 
estimator provides an optimal estimate in the presence of ad¬ 
ditive white Gaussian noise (AWGN). The method is complex 
and therefore, a number of approaches have been developed 
to reduce its complexity such as those proposed in Ha- 
El. Unlike the least squares (LS) or interpolation based 
techniques El, the MMSE estimation has a clear edge in 
that it can effectively utilize the channel statistics to improve 
the estimation accuracy. However, the direct generalization of 
these techniques to massive MIMO has some drawbacks. In 
particular, they suffer from huge complexity due to matrix 
inversion of very large dimensionality, making it impractical. 
Some methods to reduce the complexity of MMSE estimator 
in massive MIMO have also been proposed e.g., DIS-iMI. 
It is important to note that most of the existing methods 
make assumptions that are not always true. For example, 
many methods deal with flat fading channels only while others 
assume that the channels are sparse. Therefore, low complexity 
channel estimation approaches suited to multi-cell and multi¬ 
carrier massive MIMO systems need further investigations. 

In this paper we propose a distributed algorithm for the 
estimation of correlated Rayleigh fading channels in massive 
MIMO-OFDM systems. The novel distributed LMMSE algo¬ 
rithm significantly reduces the computational complexity while 
attaining near optimal CIR estimates. The distributed approach 
is inspired by our previous work in l25l (where channels are 
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assumed to be sparse and exhibit common support with the 
neighboring antennas). Furthermore, in order to enhance the 
estimation performance, we also propose a data-aided estima¬ 
tion technique that relies on finding a set of most reliable data 
carriers to increase the number of measurements, instead of 
increasing the reserved pilot tones ll26i . Equivalently, by using 
the data-aided technique, the number of reserved pilot tones 
can be reduced to attain a performance that is comparable to 
pilot-based estimation, thus increasing the spectral efficiency. 

In a multi-cell setting, allocation of orthogonal pilot se¬ 
quences for all users cannot be guaranteed due to finite coher¬ 
ence time of the channel and the limited available bandwidth 
0. Therefore, it is inevitable to reuse the pilot sequences 
across the cells. One of the major consequences of pilots reuse 
is that when the BS in a cell is performing channel estimation 
via uplink training, the channel estimates will be severely 
distorted (contaminated) by the pilots of the neighboring cell 
users. The impact of pilot contamination on channel estimation 
is far greater than AWGN. In fact, it was shown in [271 that 
the effect of uncorrelated interference and fast Rayleigh fading 
diminishes as the number of BS antennas increase while the 
effect of pilot contamination is not eliminated. Hence, it is 
important to investigate the effect of pilot contamination on 
MSE performance of different channel estimation techniques. 
Although the effect of pilot contamination on system per¬ 
formance has been analysed by many researches e.g., 128S . 
|29j , only few studies have analysed its impact on channel 
estimation performance l20l . Moreover, in these works, the 
analysis is carried out for fixed locations of (interference) 
users. Also it can be seen from analytical expressions derived 
in these works, that the pathloss, which is determined by user's 
locations, plays an important role in MSE performance eval¬ 
uation. As such, the above works cannot analytically answer 
how the randomness of users’s locations would effect MSE 
performance under pilot contamination. In contrast to existing 
studies, we approach the problem by using concepts from 
stochastic geometry. By assuming that the interfering users are 
distributed according to homogeneous poisson point process 
(PPP), we derive analytical expressions for MSE of LS and 
LMMSE based channel estimation algorithms in the presence 
of both AWGN and pilot contamination. The analytical results 
are validated by simulations. The results clearly show the 
dependence of important massive MIMO network parameters, 
such as pathloss and user's density, on the MSE performance 
and give clue to mitigate the effect of pilot contamination. It 
is shown that the increasing the number of pilots does not 
improve the estimation performance in the presence of pilot 
contamination. Moreover, the dependence of MSE on antenna 
spatial correlations suggests that the massive antenna array 
structure could be optimized to slightly improve the estimation 
performance under pilot contamination. 

The remainder of the paper is organized as follows. Section 
ITU describes the system and spatial channel correlation model. 
In Section [III] we present the MMSE and LS based chan¬ 
nel estimation in the presence of AWGN only and discuss 
their limitations for massive MIMO. The proposed distributed 
LMMSE algorithm is presented in Section [TV] To enhance 
estimation performance, the data-aided approach is consid¬ 


ered in Section [V] Section [VI] describes the effect of pilot 
contamination on channel estimation, and the expression for 
interference correlation is presented. Based on this, the MSE 
expressions for different algorithms are derived under AWGN 
and pilot contamination. Simulation results are presented in 
Section IVIII and finally we conclude in Section IVIIII 

A. Notations 

We use the lower case letters x and lower case boldface 
letters x to represent the scalar and the (column) vector re¬ 
spectively. Matrices are denoted by upper case boldface letters 
X whereas the calligraphic notation X is reserved for vectors 
in the frequency domain. The ith entry of x is represented by 
x(i), the element of X in /th row and yth column is denoted 
by Xij and the vector x/,. represents the fcth column of X. We 
use x('P) to denote a vector formed by selecting the entries 
of x indexed by set V and X('P) to denote a matrix formed 
by selecting the rows of X indexed by V. We also use X, ; to 
refer to the (z, j)th block entry of a block matrix. Further, (.) T , 
(.)* and (.) H represent transpose, conjugate and conjugate 
transpose (Hermitian) operations respectively. We use diag(x) 
to transform a vector x into a diagonal matrix with the entries 
of x spread along the diagonal. (X{k)) denotes the hard 
decoding i.e., maximum likelihood (ML) decision of X{k). 
E{.} represents the statistical expectation. The discrete Fourier 
transform (DFT) and inverse DFT (IDFT) matrices are repre¬ 
sented by F and F H respectively, where we the (1, fcjth entry of 
F is defined as fi t k = N~ l M e -V^ lk / N , ^ fc=o, 1, 2, • • • , TV— 1 
for an TV-dimensional Fourier transform. Finally, the weighted 
norm of a vector x is given by ||x||^ = x ff Ax. 

II. System Model 

We consider a multi-cell massive MIMO-OFDM wireless 
system as shown in Fig. |T] where the BS in each cell is 
equipped with uniform planar array (UPA) consisting of a 
large number of antennas. Moreover, we assume that each BS 
serves a number of single antenna user terminals. The antennas 
on UPAs are distributed across M rows and G columns with 
horizontal and vertical spacing of d x and d v respectively. We 
define the (m. <y)th antenna as the antenna element in mth 
row and gth column which corresponds to r=rri + M(<j — l)th 
antenna index where 1 < m < M , 1 < g < G and 1 < r < R, 
where R=MG is the total number of antennas in a UPA. Fig. 
[2] shows an example of a MxG UPA structure with antenna 
indexing. Note that, depending on values of G and M, the 
antennas could have linear or a rectangular configuration. We 
however, confine our attention to rectangular UPA structure 
which is a viable configuration in deployment scenarios for 
massive MIMO 0. 

Each user communicates with the BS using OFDM and 
transmits uplink pilots for channel estimation. We assume that 
all users in a particular cell are assigned orthogonal frequency 
tones so that there is no intra-cell interference. However, due 
to necessary reuse of pilots, there are users in the neighboring 
cells that transmit pilots at the same frequency tones, resulting 
in an inter-cell interference or pilot contamination. Since 
only the user in a particular cell of interest will experience 
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Figure 1: Multi-cell massive MIMO system layout. 
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delay profile (PDP). As manifested by (T]), R a rray is assumed 
to be identical across the Z taps while R tap is assumed to be 
identical across the array. For the spatial correlation matrix 
Rarrayi we adopt a ray-based 3D channel model from li30l 
which is more appropriate for rectangular arrays. Accordingly, 
the spatial correlation between array elements r=(m,g) and 
r'=(j>, q) is given by. 


[R„ 


D 1 D 7 + (D 2 (3i n ^^) 2 P 2 P 6 


array i rr i — 

where the ZVs are defined as. 


( 2 ) 


JJ 1 _ e J^ L (p-m)cos(e) e -^(i^^-) 2 (p-m) 2 sin 2 e , 

D 2 = q - g)sin(9 ), 

D 3 = £?^-(q - g)cos(d) , 

Di = \ (^v) (P ~ ™)(q ~ 9 )sin( 2 Q) > 

D 5 = {D 3 ) 2 (sin{(/))a) 2 + 1 , 

Dq = D±{sin{4i)<j) 2 + cos(cf>), 

Di = (D 3 ) 2 cos 2 4> — (Di) 2 {sin{(j))a) 2 — 2D^cos(j) ■ 


Figure 2: An example of M x G UPA structure with antenna 
indexing. 


interference from the users of neighboring cells that share 
pilots at the same frequency tones, hence without loss of 
generality, it suffices to consider one user per cell with all 
users transmitting pilots at same OFDM frequency tones. 

A. Channel Model 

In the discussion that follows, we assume that there is no 
inter-cell interference and thus focus on a single-cell single- 
user scenario (the case of multi-cell will be treated in section 
IvTl further ahead). Further, we assume a multi-path channel 
between user and receive antenna r modeled by a Gaussian 
L-tap CIR vector. Specifically, the channel between user and 
antenna r is defined by h,,= [h r ( 0), h r ( 1), • • • , h r (L — 1)] T 
where h r (l) G C represents the Zth tap complex channel gain. 
We append all the CIR vectors from a user to the R antennas 
of the BS to form an RL dimensional composite channel 
vector h= [h|. hJ. ■ ■ ■ , li^,] . Further, we collect the Zth tap 
of all transmit-receive pairs to form an R dimensional Zth tap 
vector h^= h 2 (l), ■ ■ ■ ,/i#(Z)] T . Then, the RL x RL 

dimensional composite channel correlation matrix can be 
written as, 

Rh — E{hh H } = Harray ® Riap , 0) 

which is the kronecker product (0) of two components: (i) 
The RxR dimensional antenna spatial correlation matrix, 
Rarray=E{h^hW H }, VZ=0,1, • ■ • , L — 1, which represents 
the correlation among the Zth taps across the array and 
(ii) The LxL dimensional channel tap correlation matrix, 
R top =E{h 7 .hJ?}, Vr=l, 2, • • • ,R , which represents the cor¬ 
relation among the CIR taps that depends on channel power 


Here, v is the carrier-frequency wavelength in meters, <f> 
and 9 are the mean horizontal angle-of-departure (AoD) and 
the mean vertical AoD in radians respectively, a and £ are 
the standard deviation of horizontal AoD and the standard 
deviation of vertical AoD respectively. As shown in l30l . the 
spatial correlation matrix can be well approximated as. 


R 


array 


Raz & Re/ 5 


(3) 


where R az and R e ; are the correlation matrices in azimuth 
(horizontal) and elevation (vertical) directions, having dimen¬ 
sions (MxM) and ( GxG ) respectively and are defined as, 


[R el] m ,p e ' 


fR a 


J 2 2 _^(p_ m ) cos (0) l (^ 27^)2 ( p _ m )2 sin 2 g 


1 


y/Dl 


> 1 d 3 cq 3± i (g 2 ^r 

e J e 2 


B. Signal model 

We assume that there are N OFDM sub-carriers and let 
X represent the TV-dimensional information symbol whose 
entries are drawn from a bi-dimensional constellation e.g., 
Q-QAM. The equivalent time-domain symbol is obtained by 
taking inverse Fourier transform i.e., x=F 11 X. The time- 
domain symbol is then transmitted after inserting a cyclic 
prefix (CP) of length at least L— 1 to avoid inter-symbol- 
interference (ISI). After removing the CP at the receiver, 
the frequency-domain OFDM symbol at rth antenna can be 
represented as, 

y r = diag (X)Hr + Wr , (4) 

where, W r is frequency domain AWGN vector of zero mean 
and covariance Rn^cr^Iyv and 9Z r is the channel frequency 
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response between the user and receive antenna r i.e., 

= v^Fh,. . (5) 

Here F is truncated Fourier matrix formed by selecting the 
first L columns of F. Using (0, we can re-write (0} as, 

y r = Vn diag(AT)Fh r + W r = Ah r + W r , (6) 

where A=\/]V"diag(A)F and the noise vector W r is assumed 
to be uncorrelated with the channel vector h r . We assume 
that K sub-carriers are reserved for pilots and the remaining 
N — K for the data transmission. Further, it is best to allocate 
the pilots uniformly as shown in OTl . Hence, for a set of pilot 
indices denoted by vector V , the system equation 0 reduces 
to, 

y r (V) = A(V)h r + Wr(V), (7) 

where y r (V) and W r ('P) are formed by selecting the entries 
of y r and W r indexed by V while A(V) is a K x L matrix 
formed by selecting the rows of A indexed by V. 

We can now collect the pilot measurements (|7} received by 
all antennas into a single system of equations as follows, 


U r = VNF 


y(v) = [ifl® A(p)]h + w(p) 


( 8 ) 


where, y?)=\y?V),- 




W{V) 


I/j represents an R x R identity 
matrix and h, as defined earlier, represents the composite 
channel vector from user to the BS. For convenience, we 
assume the noise variance to be identical across the array so 
that W(V) ~ CAf(0, R. !i ,=cr^I_Rx). Note that the number 
of unknown channel coefficients in ® are RL whereas the 
total number of equations are RK. Therefore, a necessary 
condition to solve 0 for h (and also 0 for h r ) using least 
squares, is that the number of pilots be at least equal to L 
i.e., K > L. However, K could be reduced if we utilize the 
correlation information. With the models defined above, we 
are ready to estimate the CIRs between the user and each BS 
antenna. We pursue different approaches that can be adopted 
for channel estimation in massive MIMO setup depending on 
whether the information processing takes place independently 
at each antenna element or jointly at a centralized processor. 
We start with naive LMMSE and LS based techniques and 
discuss their limitations, and then propose a new distributed 
approach in section [IV] which is further extended in section 
0 with the help of data-aided approach. 


III. LMMSE AND LS BASED CHANNEL ESTIMATION 

In this section, we present three different techniques for 
channel estimation in massive MIMO-OFDM based on the 
well-known LMMSE and LS estimators and discuss their 
limitations. For now, we assume that estimates are corrupted 
only by the white noise. Hence, without loss of generality, we 
consider a single-cell single-user scenario for the approaches 
presented below. 


A. The Localized LMMSE (L-LMMSE) estimation 

In this approach, all CIRs are estimated independently 
based on the observations received at each antenna element 
by using the classical LMMSE estimation. Using the linear 
system model in 0, the LMMSE estimate of h r is obtained 
by minimizing the (local) MSE, E{||h r —h r || 2 }, over h. r as 
follows J32) 

hr = (R-top + A h R“ 1 A) _1 AHR- 1 ^ , (9) 

where we drop the index vector V for convenience. Similarly, 
it follows that the (minimum) MSE is, 

mse r = trace (Rj)jp + A h R,~ 1 A) 1 . (10) 


The overall global MSE is obtained bv taking summation 
over all array elements i.e., MSE^ L ^= X] r =i mse n which after 
simplifying (flOt . can be expressed as, 


mse ( l) = rJ2 


1 + pKS l 


(ID 


where are eigenvalues of R tap , p = E x /a^ is the 

SNR with E x representing the average signal energy per 
symbol and the superscript (L) indicates L-LMMSE. Observe 
from < fm > that channel delay spread L, has an adverse effect 
on MSE performance, which can be reduced by increasing 
the number of pilot tones. The computational complexity of L- 
LMMSE is of the order 0(i?L 3 ) (see TablcQ}, which increases 
linearly with the number of BS antennas. However, the CIR 
estimates are not optimal in the sense of minimizing the overall 
or global MSE. The estimates would have been optimal, had 
the antennas been placed sufficiently apart so that the channel 
vectors were effectively uncorrelated. But for massive MIMO 
with extremely large number of antennas, it is expected that 
antennas are located in close proximity, so the channel vectors 
are highly likely to be correlated with each other. 


B. The Optimal LMMSE (O-LMMSE) Solution 

In this strategy all the channel vectors are estimated simulta¬ 
neously by minimizing the global MSE, E{||h— h|| 2 } over the 
composite channel vector h. This could be realized by sending 
all observations to a central processor and then invoking the 
LMMSE estimation based on the composite system model in 
0. The solution to this problem is given by, 

h = (R h 3 + A h R“ 1 a) a h r ~ l y , (12) 


where, A=I/{ ® A, Rh is as given in CO and for notational 
convenience we dropped the index V. The corresponding MSE 


is, 

MSE(°) = trace (R h 1 + AHR^a) , (13) 


which can be simplified to yield, 


MSE (0) = EE 




j=l i= 1 


1 + pKr/jSi 


(14) 


where, rjj and Si are eigenvalues of R array and Rj ap re¬ 
spectively. By comparing (fTTb with (fTTt . we conclude that 
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in presence of spatial correlation, the optimal solution yields 
better MSE performance than the localized strategy, however, 
it has the following two major drawbacks: 

1) Realization of optimal strategy requires global sharing 
of information to/from the central processor that results 
in communication overhead (as it requires complex 
signalling which can be very expensive). 

2) As evident from (HU), the computation of optimal 
LMMSE requires inverting a non-trivial matrix of very 
high dimension ( RK x RK ) that leads to computational 
complexity of order 0(f? 3 L 3 ), which is cubic in number 
of BS antennas. 

In massive MIMO scenario where R is of the order of few 
hundreds, both of the above mentioned operations are very 
expensive and possibly impractical. 

C. Estimation using Least Square (LS) 

If the channel statistics are unknown, one can employ simple 
LS based estimation. In the absence of correlation, we can 
let the inverse of channel correlation matrix go to zero, i.e., 
Rjjjp —> 0, thereby ignoring the channel statistics. Therefore, 
the localized LS solution from © is, 

h 1 ; = (A H A)~ 1 A H y r , (15) 

and the resulting MSE is given by, 

msejf = trace (A h R“ 1 A) . (16) 

In this case, the overall MSE simplifies to, 

R 

MSE (ls) = y mse]? = ^ . (17) 

7=i pK 

Comparing © with CD, we conclude that LS has poor 
performance in comparison with the LMMSE as it does not 
utilize the channel statistics. It is for this reason that the 
centralized LS (C-LS) solution would achieve the same MSE 
performance as the localized one as shown below. 

MSE (c " LS) =trace ((I fl ® A) H (I* ® R.J” 1 (I* ® A)) 

= trace (i^ ® A h R,“ 1 A) 1 , 

R 

= trace (A H Rjj 1 A) 1 , 

r=1 

= MSE (ls) , 

where we have used the Kronecker product identities, (A ® 
B)(C <g> D)=AC ® BD and (A <g> B) _1 =A~ 1 <g> B' 1 . 

In short, the L-LMMSE estimation has the advantage of 
low complexity (and better performance than LS) but it is 
unable to exploit the strong spatial correlation among antenna 
elements which is inevitable in massive MIMO systems. On 
the other hand, O-LMMSE exploits the spatial correlations but 
at a significantly higher computational cost. This motivates 
us to propose a method that can overcome the shortcomings 
of aforementioned techniques without affecting the estimation 
quality. Specifically, we propose a distributed estimation of 
CIRs based on antenna coordination that attains near optimal 


performance with tractable complexity. The proposed dis¬ 
tributed LMMSE estimation is described below and is further 
extended in section [V] via a data-aided technique. 

IV. The Proposed distributed LMMSE (D-LMMSE) 

ESTIMATION 

It is well known from equivalence results in linear esti¬ 
mation theory l33li that the O-LMMSE solution (fl2l > could 
be alternatively obtained by solving an RL dimensional opti¬ 
mization problem, 

argmin||y - A'h||^-i + ||h||^-i , (18) 

h h 

where all the variables are as defined earlier. Instead of 
solving © globally (as done earlier), we aim to solve it 
in a distributed manner over R antennas in which the ?’th 
antenna has access to y r only. Moreover, the antenna r is 
interested only in determining its own CIR (i.e., h r ) without 
worrying about other h/s. Here, we would like to mention that 
this problem is fundamentally different from those considered 
in the context of adaptive networks lf34l . Also, most of the 
existing distributed estimation techniques in adaptive networks 
deal with single task problems in which all nodes in the 
network estimate a single common parameter of interest. 
Furthermore, they rely on full cooperation between the nodes, 
i.e., exchanging both the estimates and the observations with 
the neighbors. Although, the distributed recursive least squares 
(RLS) algorithm of ll35l might be adopted to solve (fl8l l. it 
would be gravely complex in number of dimensions (due to 
large R in massive MIMO and large channel delay spread) 
and hence might suffer from convergence issues. Our proposed 
solution, the distributed LMMSE (D-LMMSE) algorithm, as 
will become clear, is much simpler in that it exploits the 
structure of spatial correlation matrix R rarray and relies only 
on exchanging the (partial) weighted estimates of CIRs with 
immediate neighbors, thus reducing the communication and 
computational cost significantly. The proposed D-LMMSE 
algorithm is composed of three main steps namely the esti¬ 
mation, sharing and update, as explained below. 

A. Estimation 

In the estimation step, each antenna acting as a center 
antenna rc , estimates not only its own CIR but also the CIRs 
of its neighborhood. The neighborhood of rc consists of 4- 
direct neighbors represented by the set i'r, rjj, ro}Q 

on the left, right, top and bottom positions respectively as 
shown in Fig. |3(a)| Also, let the corresponding channel vectors 
be represented by he, h/, li«, hy and he respectively and 
let h c represent |.A/ + 1 L x 1 dimensional composite channel 
vector of the central antenna and its |Af| direct neighbors (i.e., 
h c = it 3 . h] . h]j, lij . h^] )■ During the estimation process, 
each antenna acting as a central antenna computes the estimate 
of h c by solving a reduced dimensional weighted least squares 

'Note that for elements lying at the edges of a UPA, the number of 
neighbors are different, so that 2 < .V < 4. The set of neighbors including 
the central antenna is represented by A/"+. 
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(WLS) optimization problem, 

h c = argminliycOP) - A(P)ho||^i + \\h c f R -i , (19) 

h c “ h ° 

where yd'P) represents pilot observations at the central 
antenna, R h c is channel correlation matrix defined as R h c = 
E{h c (h c ) H } and R u , =cr^I k is the noise covariance matrix 
at the central antenna. From (HD it is clear that information 
is processed locally at each antenna as each antenna uses 
only its own observations and interacts with its neighborhood 
only through Rh= (it is assumed that the central antenna 
has available correlation information of its neighborhood to 
construct Rh<0- The solution to the above WLS minimization 
problem can be obtained by first re-writing ( IT9b explicitly in 
terms of h c as, 

h c = argmin ||y - Ah c ||^_i + ||h c ||^-i , (20) 

h c w h ° 

where, y=y C (V) and A= [A(V) Okxl\aT\] ■ Then, by in¬ 
voking the equivalence between LMMSE and WLS estimation 
problems we obtain, 

h c = (R h ;! + a h r~ 1 a) _1 A^n^y . (2i) 

We define P c = (C';) _ l as the inverse of error covariance 
matrix at the central element, which is given by the expression, 

P c = R h i + A h R- 1 A . (22) 

Then, by using (El l into (EK the weighted estimate of 
composite channel at each antenna is simply, 

K = pC h c = A H R ffi 1 J . (23) 

This weighting of the estimates asserts that we put more 
confidence into the estimates which are more reliable and vice 
versa. The estimation step is non-recursive and is computed 
once for all antennas in the array. Thus, having found the 
P matrices in (El l and the weighted estimates in ( El l, each 
antenna is ready to initiate sharing. 

B. Sharing 

The sharing step is the key to the proposed distributed 
algorithm where the information is shared through collabo¬ 
ration between antennas. Let us define the sub-vector h w j 
of composite vector h )( , as a (weighted) CIR estimate of 
antenna j (i.e. the vector h,) computed by the antenna k. 
In sharing step, each antenna acting as a central element, 

shares only the partial information with its neighbors such 

k 

that the antenna k, with composite vector h^, would share 
only the selected components; its own (weighted) estimate 
h w k and the (weighted) estimate h w j , j £ J\f, with its j th 
neighbor. Henceforth, the shared vectors will be termed as 
partial vectors and represented by an underlined notation. An 
example of how this sharing takes place is also depicted in Fig. 
13(b)! for a 3 x 4 array with central element rc =1 having only 
two neighbors; J\f={rn=A, ro=2}. As shown, each of the 
neighboring element shares only two sub-vectors (i.e., partial 
information) of its composite vector with the central antenna. 
The collaboration between the rest of the array elements takes 


place in a similar fashion. 

As a result of information sharing, each antenna acting as 

A j 

a central node rc receives |A/”| partial vectors, h w ,j £ J\f, 
from its neighbors, each of dimension \N + \L x 1 and having 
only two non-zero components; h,,., and h„. r: . For the example 
in Fig. [3(b)| the composite vector of the central node and the 
partial vectors received from its neighbors are given as follows. 


hie 1 

A 4 


1 V s 2 

h-n; 1 

1 

^ CN 

_1 

II 

J\ 

1 

i° 

_1 

and h w = 

1 

<N 

0 CH 5 
_ 1 


Note that the estimates which are not shared have been 
assigned as null vectors. 

C. Update 

Having received the (partial) LMMSE estimates from the 
neighboring elements, each antenna acting as the central 
element updates its estimate and error covariance matrix. The 
update rule is based on the optimal combining of estimators, 
a standard result in LMMSE estimation theory. The result is 
summarized in the following lemma. 

Lemma 1. Let yi and y 2 be two separate observations of 
a zero mean random vector h, such that yi=Aih + wi 
and y 2 =A 2 h + W 2 , where we assume that h is uncorrelated 
with both wi and W 2 . Let hi and I 12 denote the LMMSE 
estimates of h and Ci and C 2 be the corresponding error 
covariance matrices in two experiments. Then, the optimal 
LMMSE estimator and the error covariance matrix of h given 
both the observations are, 

C-^ = c^hi + c^h 2 , (25) 

and 

C -1 = Cj" 1 + Cf 1 + R^ 1 — Rj" 1 — R^" 1 , (26) 

where, R^=E{hh H } and Ri and R 2 are covariance matrices 
of h in two experiments. 

Proof: See (33l . ■ 

Aforementioned lemma can be easily extended to more than 
two observations. The lemma suggests an optimal way of 
combining the individual estimates obtained by independent 
observations. We use this lemma at each antenna to improve 
the initial channel estimate by combining it with the estimates 
computed and shared by |A/’| neighbors. Consequently, by 
treating each antenna as a central element rc, the update rule 
is given by following equations, 

(£“> = (.?-'> + £ £!“> . ( 27 ) 

feW 

and 

pc(i) = pc(i— 1 ) + 5 (28) 

where P J and R h , represent the partial (inverse) error 
covariance and correlation matrices associated with the 

- j 

partial estimates h u , and i represents the iteration index. 
Note that in the update equations, we employed the weighted 
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(a) Information diffusion process (b) Information sharing process 


Figure 3: |(a)| During the first iteration rc (blue antenna) receives information from its 4-direct neighbors (pink antennas). In 
the second iteration, the information from next nearest neighbors (green antennas) also comes in and so on. |(b)| An example 
of a 3 x 4 antenna array where the neighboring antennas (indices 4 and 2) share the selected estimates (highlighted) with the 
central antenna (index 1). 


estimates and inverse error covariance matrices to minimize 
the computational requirements. The recursions in the update 
equations are initialized by (l23l) and ( 1221 respectively, which 
are available after the estimation step. In the subsequent 
iterations, each antenna would also require the partial matrices, 
P 7 's and R h 2s, for each of its \Af\ neighbors. Fortunately, 
they can be obtained from P c and Rh respectively (which 
are available at the central antenna) by exploiting the 
symmetrical structure of R array . Thus, there is no need to 
share them across the neighboring elements, that in turn saves 
a significant amount of communication burden. Specifically, 
the matrices R h - and P c exhibit the following two properties □ 

Property 1: The matrix R], is identical for all elements in 
the neighborhood of rc i.e., Rhc=RhfjV) £ A f 
Property 2: The matrix P c is identical for all elements in the 
neighborhood of rc i.e., P°=P \ij £ M 

Property 1 is attributed to the symmetric nature of the spatial 
correlation matrix R array, which implies that the spatial 
correlation between any two antennas, placed equidistant apart, 
is the same. Therefore, it is not difficult to see that property 
1 holds exactly under the Kronecker model and our earlier 
assumption of identical tap correlation across the antenna array 
in section [TT] Property 2 is the consequence of property 1 when 
incorporated into (l22l >. 

Hence, to obtain the patrial correlation matrices, R h ,, j £ 
AT, we use property 1 to first set R h , =Ri 1 c and then mod¬ 
ify the off-diagonal block entries corresponding to the null 
vectors of partial estimates as R y =0 if any h,,,,. li„,j=0 and 
the diagonal block entries as Rj—Ir, if h,,„=(), where the 
subscript ij denotes the (i, j)th block. The matrices P^'s are 
obtained in the similar fashion except that the diagonal block 

2 These properties are generally satisfied as the spatial correlation matrix is 
usually symmetric, if not, then the antennas can share these matrices as well. 


entry corresponding to null vectors is replaced by al where 
0 < a < 1 is a small positive number, which indicates 
very low weight or confidence in null estimates (that are not 
shared). In essence, the central element has the full information 
needed to construct P J 's and R h , ’s corresponding to shared 

~ j 

estimates h^,. We illustrate how these matrices could be 
obtained for the example in Fig. |3(b)| Consider the central 
antenna rc=l, its |A/"|=2 direct neighbors with (shared) partial 
estimates given in (l24l) . The partial correlation and error 
covariance matrices associated with those estimates (shown 
underlined) along with that of central element are given in 
(l29l) and (l30l > respectively. 

Based on above steps and procedures, the proposed D- 
LMMSE algorithm is summarized in Algorithm Q] 



R 11 

Rl4 

Rl2 


R44 

R 41 

0 

R h i = 

R 41 

R 44 

R42 

> E:h 4 — 

Rl4 

R 11 

0 


R 21 

R 24 

R 22 


0 

0 

I L 





R 22 

0 

R 21 



and R h2 = 

0 

I L 

0 





R 12 

0 

R 11 


(29) 


P' = 


Pll P14 

P41 P 44 

P2I P24 


Pl2 

P 42 

P 22 


, P 4 = 


and = 


P44 

P14 

0 

P22 

0 

P12 


P41 0 

Pit 0 

0 al 

0 P21' 

al 0 
0 Pn 


(30) 


Remarks: 

1) The information sharing and update take place during 
each iteration of the algorithm such that after few itera¬ 
tions the information diffuses across the whole antenna 
array. This concept of sharing is depicted in Fig. |3(a)| 













































































































Algorithm 1 Distributive LMMSE (D-LMMSE) algorithm 

1) (Estimation) Each antenna acting as a central element 
rc computes h„ and P c by using (l23l) and (l22l) 
respectively. 

2) (Sharing) Each antenna acting as a central element rc 
shares partial estimates, h t(; with its |,A/| neighbors as 
described in IIV-BI 

3) (Pre-processing) Using Rh< , P c from step 1 and the 

received (partial) information in step 2, each 

antenna, acting as a central element rc, constructs 
{Eh/ }, {E 7 }, 3 G N. 

4) (Update) Each antenna acting as a central element rc, 
updates its weighted estimate and error covariance using 
(ISTjl and (l28l) respectively. 

5) (Iterate) Repeat steps 2-4 D times, where D represents 
maximum number of iterations. 

6) (Output) Compute h =(P c )” 1 h u) and output the esti¬ 
mated CIR he. 


which shows that information diffusion process grows 
exponentially, resulting in fast convergence. 

2) The repetitive sharing enables each antenna in the array 
to utilize the observations from distant elements, thereby 
improving its estimate in each iteration till it converges 
to near optimal solution. 

3) As opposed to the central processing mechanism, the 
proposed sharing step is more convenient and computa¬ 
tionally more efficient as all antennas do not commu¬ 
nicate with each other. The collaboration takes place 
only among the neighboring antennas. Therefore, the 
complexity of proposed algorithm is significantly less 
than the centralized approach. 

4) Note that, the antennas share only the partial information 
because only selected vectors are transmitted to the 
neighbors which save significant amount of communi¬ 
cation. Also, estimation step and the repetitive sharing, 
pre-processing and update steps require simple linear 
block processing and have a fixed size data structure 
which is well suited for real implementations. In con¬ 
trast, the memory and processing requirements for the 
centralized approach are even more challenging with 
large array dimensions. 

D. Complexity Analysis 

In Table Q] we compare the computational complexity of 
proposed D-LMMSE algorithm with LS, L-LMMSE and the 
centralized O-LMMSE algorithm in terms of multiply and 
add operations. The figures indicate that complexity of pro¬ 
posed algorithm is slightly higher (linear with BS antennas) 
than L-LMMSE but is significantly less than the centralized 
approach. For the proposed D-LMMSE algorithm, it is also 
worth mentioning here that, the P matrices in (l22l > can be 
computed off-line and in parallel at all antennas as they 
do not depend on observations. Moreover, the computation 
of weighted estimates in (f23l> does not involve any matrix 
inversion. Further, the update in (l27l) requires simple addition 


Table I: Computational Complexity 


Algorithm 

Multiplications (x) 

Additions (+) 

Complexity 

LS 

RK{L + 1) 

R{KL - 1) 

O(RLK) 

L-LMMSE 

R[2L 3 4 + L 2 + 

K(L + 1)] 

RL[L+K- 1] 

0(RL 3 ) 

O-LMMSE 

R[{L 3 + 1 )R 2 + 
RL(L+I<)+K]+L 3 

R 2 LK 

0(R 3 L 3 ) 

D-LMMSE 

R[(5 3 +1)L 3 +2(5L) 2 
+ L(K+ 1) + 5 3 ] 

R[D(5L) 3 +(5L) 2 
+L(K- l)-D] 

0(R5 3 L 3 ) 


during each step of iteration, while (1281) needs one time 
computations of inversions R^/ as they do not depend on 
iteration index. Finally, the computation of inverse,(P c ) _1 is 
also required only after the convergence when each antenna 
outputs its final estimate. 

E. Choice of D 

The choice of parameter D i.e., maximum number of 
required iterations, has a great influence on computational 
complexity and convergence of the proposed D-LMMSE al¬ 
gorithm. A trivial choice for D is that it can be set to the 
largest dimension of the array i.e., D=max(M, G), which 
will ensure that each antenna receives information from every 
other antenna in the array. The aforementioned choice of D 
would guarantee the convergence of the proposed D-LMMSE 
algorithm but such a high value of D is very inefficient from 
the computational complexity point of view, particularly if the 
array dimensions are large. We therefore, derive a simple loose 
upper bound on maximum number of iterations D that is much 
better than the trivial choice. To this end, we first note that the 
total number of antennas sharing information in D iterations 
of algorithm are 2 1) {I) +1) +1. Hence, in order to ensure that 
each antenna receives information from every other antenna in 
the array, we should have 2 D(D + 1 ) + 1 < R. Solving this 
inequality we get. 



It must be emphasised here, that the actual value of D also 
depends on the spatial correlations among antennas. If the 
antennas are not very strongly correlated, then we might not 
gain from sharing and a small number of iterations might be 
sufficient. In fact, if the antennas are completely uncorrelated, 
then the sharing cannot improve the channel estimates as the 
O-LMMSE solution will converge to the L-LMMSE solution 
(see Section HD. 

F. Convergence of D-LMMSE Algorithm 

The notion of convergence of D-LMMSE algorithm is 
attributed to the fact that every iteration of algorithm succes¬ 
sively brings new information from the neighboring tiers to 
the central antenna. 

For example, consider the antenna array depicted in Fig. 
|3(a)| and focus on the central antenna rc- Let A(o and Rfq 
represent the extended data matrix and channel correlation 
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matrix respectively, during the i-th iteration. Then by defining 
Ii = 1, we can write. 


A(j) = I i+1 ® A and R (i) = R $ ray <S> R tap (32) 


where, R array represents the spatial correlation matrix of the 
central antenna and all its neighbors up to v'-th tier. Further 
assume that, {6i}f =1 and are eigenvalues of R iap 

and R array respectively, arranged in decreasing order of 
magnitude. Then for D= 0 (i.e., no sharing case), the resulting 
MSE at the central element is. 


mse^ = trace ^R( 0 ) + 


A^R„ ,:l A 


(o) 


-r 



(33) 


which is obviously the MSE of L-LMMSE in (ITot . Similarly, 
for D = 1 (i.e., sharing up to the first tier), rc receives 
information from its \Af\ neighbors (e.g., red antennas of the 
1 st tier) and updates its estimate by optimal combining of 
neighboring estimates as described in IIV-CI Therefore, the 
resulting MSE at rc can be written as, 


mSe r C j^+l 


trace (R (1 * 




-r 


IA+1 L 

EE' 


VjSi 


l^+l fri 1 + pKiySi ' 


(34) 


Comparing |3l with (El, we note that mse^^mse^, 
where the equality holds only if r/j =1. V) (i.e., spatially 
uncorrelated channels). Proceeding similarly, it can be shown 
that mse^^mse^T 1 , so that the MSE during each iteration 
decreases monotonically till it converges after utilizing obser¬ 
vations from all antennas in the array. 


V. Data-aided Channel Estimation 

The basic idea of data-aided channel estimation is to exploit 
the data sub-carriers in order to improve the initial channel 
estimates obtained using only the pilots. As the data aided 
technique does not require additional pilots, it is spectrally 
more efficient. Here, the pilot-based channel estimate is used 
for data detection, which along with the reserved pilots can 
significantly enhance the channel estimation. It is possible 
that some of the data-pilots be erroneous due to noise and 
channel estimation errors, while some of the other data-carriers 
are reliable i.e., they are likely to be decoded correctly. An 
important problem is how to down-select a subset of the most 
reliable data-carriers to be used as data-pilots. 


A. Reliable Carriers Selection 

Consider the received OFDM symbol at any antenna as 
shown in E}, and let h and "H be the CIR and CFR estimates 
obtained using pilots. Then, the tentative estimates of the 
data symbols are obtained by equalizing the received OFDM 



Figure 4: Concept of reliable carriers selection. Here, X(2) 
has higher probability of decoding correctly than X(\). 


symbol using zero-forcing (ZF) as follows, 

A(fc) = E^, ke{l,2,---,N}\V 

« X(k) + ¥^l = X(k) + Z(k), (35) 

where, Z(k) represents the distortion on fc-th data-carrier 
due to noise and channel estimation error. Given the CFR 
estimate, Z{k) can be modelled as Gaussian with zero mean 
and variance ='H(fc)~ 2 cr^. The recovery of data symbols is 
then performed by simple hard decisions on estimated symbols 
X(k) denoted by (X(k)). Clearly, the errors in the decoding 
process occur due to noise as well as inaccurate channel esti¬ 
mates. Hence, some data-carriers would be severely effected 
by noise and channel perturbation errors i.e., Z(k) and fall 
outside their correct decision regions, while for some other 
data-carriers the distortion is not strong enough and they are 
decoded correctly. All those data carriers X(k) which satisfy 
the condition {X{k))=X(k) with high probability, are termed 
reliable carriers. 


The proposed strategy for selecting the subset 1Z of the 
most reliable data-carriers, motivated by ||26l . is based on the 
criteria, 

f z (z(k)=X(k) - (X(k))) 

(R(fc)=—g--- L -, (36) 

EEi,^<^)> /• (*(*)=*(*) - A m ) 

where, / 2 (.) is the pdf of Z(k) and A rn represents the set 
of constellation alphabets. Note that the numerator in d36l) is 
the probability that X(k) will be decoded correctly while the 
denominator sums the probabilities of all possible incorrect 
decisions due to distortion Z(k). The subset 'R, is formed by 
selecting only those data-carriers for which 0l(k) > 1 i.e., 

Tl = {k | (R(fc) > 1} . (37) 

The metric (l37l > is intuitively appealing as it selects only those 
sub-carriers which are likely to be decoded correctly with high 
probability. Fig [4] further elaborates this idea; that even though 
X(l) and X{2) have the same distance from X, X(2) is 
more likely to be decoded correctly than A’(l), as it is farther 
from the nearest neighbours and therefore is less likely to be 
decoded as any other constellation point. 
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B. Revisiting the Estimation Step 

We now revisit the estimation step of the proposed Algo¬ 
rithm [I] using both the pilots and reliable carriers in order to 
enhance the initial estimates. Let TV be the set of indices of 
reliable data carriers for antenna r, obtained in reliable carriers 
selection process. Each antenna could revisit the estimation 
step by solving ( IT9t using an extended set of indices, V U TV 
corresponding to pilots and reliable data carriers. To make 
it computationally efficient, we instead proceed by exploiting 
the block form of RLS to update the pilot-based estimates. 
Skipping the derivation, the update equations for data-aided 
estimation at antenna r are given by, 

h r d = h r + CgA^G (y d - A d h r ) , (38) 

C r ed = Cl - C r e A«GA d , (39) 

G = (R„, + ArfCgAf ) _1 , (40) 

where, y d =y r {V U TV) is extended set of observations, 
A d = [A(V UlZ r ) 0\-p U fcr\ x \j^\ L \ is the extended data ma¬ 
trix, G represents the gain matrix and h and C) are respec¬ 
tively the estimate and error covariance matrix obtained using 
only the pilots during the estimation step. The complete data- 
aided approach is described in Algorithmic] 


Algorithm 2 Data-aided Distributive LMMSE (DAD- 
LMMSE) Algorithm 

1) Run step 1 of Algorithm Q] to get h and C£ at each 
antenna index r. 

a r 

2) Each antenna uses its CIR estimate, h to form the 
subset TV of the most reliable data-carriers. 

3) Update the estimates and error covariance in step (1) 
using (l38l)-(l39t. 

4) Run steps (2)-(6) of Algorithm]]] with P r =(Cp) -1 and 
h w =Pr h . 


VI. Effect of Pilot Contamination 

So far, we assumed single-cell scenario where all the users 
have been allocated orthogonal resources for uplink channel 
estimation, thus the pilot observations are corrupted only by 
AWGN. In a multi-cell scenario, predominated by pilot con¬ 
tamination due to aggressive reuse of the pilots, the knowledge 
of the interference statistics is critical in studying the effect of 
pilot contamination on channel estimation techniques. Unlike 
the existing pilot contamination analyses, we take a stochastic 
geometry based approach to derive analytical expressions for 
interference correlation. 

A. Modified Network Model 

To characterise the inter-cell interference resulting from 
pilot contamination, we modify our previous 2-D network 
model of Fig. |T| by introducing interferes that are assumed 
to be distributed according to a PPP. Due to its simplicity 
and tractability, the PPP has been widely used in stochastic 
geometry for modelling of the interference in cellular networks 



Figure 5: Realization of interferes distributed according to PPP 
of A=0.3, 7 0 =2m and 7 m =5m with BS at the origin. 

(see j36l and references therein). Specifically, without loss 
of generality, we assume a single user in a reference cell of 
radius 7 0 , communicating with the BS located at the origin 
O in a 2-D plane. The interfering users (outside radius 7 0 ) 
are distributed over a circular region of radius 7 m according 
to a homogenous PP, denoted by T and having intensity A. 
Thus, the interfering space is an annular region with radii 
7o and and where the distance of ith interferer from 

BS satisfies 7 0 < 7 i < 7 m . Fig. 0 shows a realization 
of interferes distributed according to homogeneous PP of 
A=0.3 with 7 0 =2m and 7 m =5m. Further, from Il37l - ll39l . we 
conclude that the interference itself is not correlated across 
OFDM frequency tones. This makes the analysis considerably 
simple and tractable because each OFDM frequency tone can 
be treated as an independent narrow-band channel. Hence, it 
suffices to characterize the interference at single OFDM tone. 

Consider the complex received interference at any given 
sub-carrier (at the BS antenna r) due to all interfering users, 
which can be represented as l39l . 

1 = (41) 

ie , t 

where, Xi=a,iexp{j9i} is the interfering symbol, 
h % =7='^ oijexp-jj fa } is the interfering channel, where 
(3 > 1 is the pathloss exponent, «, is an independent 
Rayleigh distributed random variable with fl=E{a 2 }=l 
and fa is independent random variable that is uniformly 
distributed over [0, 27r). The symbols x t are generated from 
a general bi-dimensional constellation with M equiprobable 
symbols A m =a^ m ^exp{j9 ( ' rn ^}, m= 1, 2, • • • , M. We assume 
that all interfering users transmit with the same average 
energy per symbol E x and that the transmission constellation 
is normalized so that E{|;rj| 2 }=l. Therefore, fiTl i can be 
expressed as, 

\/E x a,iaieyrp{j{Qi + fa)} _ ^ V E x Zi 

—0 - — ~r~ 

ie*\o '* ie\p\o h 

where, Zi = aiOtiexp{j(6i + fa)}. 

B. Interference characterization 

Although! can be completely characterized, to simplify our 
analysis of pilot contamination, we assume X to be Gaussian 
and thus require only the first two moments, i.e., the mean 
and the variance. They are given in the following lemma. 
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Lemma 2. Using the network model of I VI- A\ the mean and 
variance of interference X is, 

lix = E{I} = 0 (43) 

and 

cxf = E{|X| 2 } 

= 7rA(/3 — l) _1 E{|x| 2 }-B x f2 ( —2^Z2- 2 ^ 2 ') (44) 

\7 o 7 m / 

respectively. 

Proof: See Appendix [A] ■ 

Although (144b is derived by considering that the interference 
space is annular, it can be extended for an infinite interference 
space with a protection region of 7 0 by taking the limit as 
7 m -»• 00 yielding, 

<4 = l) _1 E{|ar| 2 } ■ ( 45 ) 


C. Effect of PC on MSE Performance 

The knowledge of interference statistics at single OFDM 
frequency tone, obtained through lemma [ 2 ] allows us to 
evaluate the aggregate interference correlation over all OFDM 
tones and/or across the whole BS antenna array using known 
channel statistics. Consider the received OFDM symbol at ?’th 
BS antenna, after omitting the index V, 

~y r = Ah r + X r + Wr 

= Ah r + £ r (46) 

where, X r is the interference at antenna r of BS due to 
pilot contamination and £ r is the interference term which 
captures the effect of both pilot contamination and the noise. 
Due to independence of noise and pilot contamination terms, 
each having a zero mean, the correlation matrix of £ r is 
R £r = hi, +R-ui • Now, using the interference power (or vari¬ 
ance) at each OFDM sub-carrier from lemma [2] the inter¬ 
ference correlation matrix R Xr across OFDM tones can be 
easily obtained as R Xr =cr\ AR tap A H , where we assumed that 
all user channels (both desired and interfering) have identical 
correlations (as in section [TT1» and use the same pilots, which is 
the worst case scenario from pilot contamination perspective. 
Similarly, in the multi-antenna case, based on system model of 
®, the interference correlation matrix for the whole BS array 
can be obtained as R £ =R X -|-R tu , with R x =crf ARhA H . 
Using these interference correlations, we can derive the MSE 
expressions for LS, L-LMMSE and O-LMMSE algorithms in 
the presence of noise and pilot contamination by replacing 
the noise covariance matrix R [u =<r 2 I with matrix R £r or R £ 
in the MSE expressions already obtained in section [III] The 
results are presented in following theorems. 

Theorem 1. For the system model described in sectionUHand 
pilot contamination as characterised in section [W] the MSE 
expression for LS estimation algorithm of section \III- Cl under 
both AWGN and pilot contamination is given by, 

MSE (ls) = ^ + Rn\ trace(A) , (47) 


where, <j\ is given in ( 1441 ) and A is a diagonal matrix with 
eigenvalues of R tap spread along the diagonal and all users 
are assumed to have similar channel characteristics. 


Proof: See Appendix iBl ■ 

Theorem [T] shows that MSE is composed of two terms. 
The first term due to AWGN can be suppressed by increasing 
the number of pilot tones but the second term due to pilot 
contamination cannot be reduced by adding more pilots and 
even persists at high SNR (i.e., p —> 00). 


Theorem 2 . For the system model described in sectionUHand 
pilot contamination as characterised in section [W] the MSE 
expression for L-LMMSE estimation algorithm presented in 
section \III-A\ under both AWGN and pilot contamination is 
given by, 


L 

MSE (L) = 

2=1 


Si (l + pKSiO f) 

1 + pKSi + pKSiUz ’ 


(48) 


where, a\ is given in (0, <5 i are the eigenvalues o/R iap and 
all users are assumed to have similar channel characteristics. 


Proof: Replace R„ with R„, + Rx r in MSE expression 
®. then invoking the eigenvalue decomposition (EVD) of 
R tap , follow the steps of Theorem Q] given in Appendix [B] 
We skip the detailed proof due to its similarity to Theorem Q] 


Note that (l48l > reduces to MSE expression for AWGN (given 
in ([Tol l) had there been no pilot contamination. At high SNR 
(i.e. p A> 1 ), when there essentially remains only the effect of 
pilot contamination, the MSE expression (l48l > reduces to, 

MSE ( L) highSNR R f^i_\ trace{A)j (49) 
V 1 + °z / 

which shows that MSE is independent of number of pilots and 
that LMMSE is more robust to pilot contamination compared 
to LS. 


Theorem 3. For the system model described in section 1771 and 
pilot contamination as characterised in section [W] the MSE 
expression for O-LMMSE estimation algorithm presented in 
section \III-B\ under both AWGN and pilot contamination is 
given by, 


MSE (0) = EE 

i=1 i=l 


p.jSi (l + pKpjSi<Jx) 

1 + pKpjSi + pKpjSiOx 


(50) 


where, <j\ is given in ( 1441 ). pj and Si are the eigenvalues of 
R array an d R* ap respectively, and all users are assumed to 
have similar channel characteristics. 


Proof: See Appendix ICl ■ 

Note that (l50l > reduces to the MSE expression for AWGN 
given in (IT3l > in absence of pilot contamination. Again observe 
that, under the assumption of high SNR, when the effect of 
pilot contamination predominates AWGN, the MSE expression 
in (l50l) simplifies to, 

MgE (0) highSNR (O f_\ trace ( RQrra J trace ( A ) (51) 

V 1 + G x J 
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Table II: Parameters for simulation 


Parameter 

Value 

Array Size (M X G) 

10 x 10 

Array element spacing d x , d y 

0.3v, 0.5v 

Number of OFDM sub-carriers ( N ) 

256 

Number of pilots ( K ) 

32 

Signal constellation modulation 

4/16/64 - QAM 

Channel length (L) 

8 


This indicates that MSE depends strongly on interference 
power and is independent of number of pilots K. Since 
trace(R array ) < R, the O-LMMSE seems to be more robust 
to pilot contamination compared to both LS and L-LMMSE. 
The MSE expression also gives us clue that effect of pilot 
contamination can be minimized by exploiting the spatial 
correlations and by optimizing the BS antenna array design. 

Above theorems quantify the effect of pilot contamina¬ 
tion on MSE performance of channel estimation in terms 
of interference power (or variance) which in turn depends 
on different parameters described in lemma U The MSE 
performance against various parameters will be numerically 
analysed through simulations. 

VII. Simulation Results 

We adopt the channel model in (|TJ with spatial correlation 
matrix given in © whose parameters are: (mean 

horizontal AoD in radians), 9=3ir/8 (mean vertical AoD in 
radians), <t=7t/ 12 (standard deviation of horizontal AoD) and 
£=7 t/ 36 (standard deviation of vertical AoD). The channel 
tap correlation matrix follows an exponentially decaying PDP, 
E{|/i r .(r)| 2 }=e -T , while rest of the parameters are given 
in the Table El where v represents the carrier frequency 
wavelength in meters. It is also assumed that receiver has the 
knowledge of channel correlations. 

To assess the performance of different algorithms we use 
the following MSE performance criterion: 

1 0 

MSE = 0 E H h ‘ - hil 2 (52) 

where, IT and If are true and estimated CIR vectors (at the 
?'th trial) respectively, each of size RL x 1 and 0 represents 
the total number of trials. We used 0=100 in our simulations. 

We conduct five different experiments to study the perfor¬ 
mance of our proposed approach and compare it with the three 
methods i.e., LS, L-LMMSE and O-LMMSE described earlier 
in Section [III] We also perform experiments to validate our 
analysis and study the impact of pilot contamination on all 
these methods. 

A. Experiment 1: How many iterations (D)? 

In this experiment we are interested in finding the number of 
iterations, required for convergence of the proposed distributed 
LMMSE algorithm. We plot the MSE of proposed D-LMMSE 


SNR—0dB 



# of iterations (D) 



(a) 


(b) 


Figure 6: Number of iterations ( D ) required to achieve the 
convergence of distributed algorithm. 


algorithm (red curve) against the parameter D (i.e., number of 
iterations) in Fig. |6(a)| The SNR was fixed at 0 dB. The MSE 
values of other algorithms, which do not depend on parameter 
D, are also shown. It can be seen that the proposed algorithm 
converges very closely to the optimal in 3 iterations. Note that, 
when the antennas do not collaborate (i.e., D= 0), the MSE 
of distributed algorithm coincides with that of L-LMMSE 
because no information sharing takes place. As the information 
from neighbors comes in during the next few iterations, the 
MSE decays exponentially until it converges to near optimal 
solution. Fig. |6(b)| also suggests that there would be hardly 
any improvement in MSE for D > 3. 

B. Experiment 2: MSE Performance in AWGN 

In this experiment, we compare the MSE performance of 
different algorithms in the presence of AWGN using the 
parameters in Table HT1 The results given in Fig. [7] show that O- 
LMMSE performs better than both LS and L-LMMSE in terms 
of MSE as it is able to utilize the antenna spatial correlations. 
As shown, the proposed D-LMMSE algorithm (Algorithm [Qi 
achieves near optimal results in just 3 iterations. The analytical 
MSE expressions given in Section [III] for LS, L-LMMSE and 
O-LMMSE under AWGN are also plotted with legends (Th.), 
which agree with simulation results. 

Fig. |8(a)| shows the MSE performance of proposed data- 
aided algorithm (DAD-LMMSE in Algorithm [2j against other 
pilot-based algorithms. It is obvious that data-aided approach 
has the best performance compared to all others and that the 
effect of using reliable carriers is more pronounced at higher 
SNR. Fig. |8(b)| demonstrates the MSE behaviour of different 
algorithms with varying number of pilots K with SNR fixed 
at 20 dB. As is shown, increasing the pilot tones yields better 
estimation performance but this comes at the cost of lower 
spectral efficiency. The data-aided algorithm however, is able 
to achieve the best performance even for a small number of 
pilot tones. 

C. Experiment 3: Mean and variance of interference 

This experiment aims to validate the mean and variance 
of the interference given in Lemma |2] In order to mimic the 
setup described in Section lVI-AI we use single antenna BS and 
assume that CIRs from each user to the BS has a uniform PDR 
Further, we assume that BS is located at the origin, the desired 
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Figure 7: MSE performance of different algorithms in white 
Gaussian noise. 




(a) (b) 

Figure 10: Effect of pilot contamination on MSE performance 
|(a)| MSE as a function of SNR for A=0.l [(b)| MSE as a function 
of A for SNR fixed at 10 dB. 


SNR = 20 dB 




Figure 8: MSE performance comparison of data-aided D- 
LMMSE algorithm with pilot-based techniques in white Gaus¬ 
sian noise. 



A 



Figure 9: Mean and variance of interference at single OFDM 
sub-carrier as a function of A. 


user at a distance of lm from BS while interfering users are 
distributed in a region of radius 5m and with a protection 
region of 7 D =2m according to a PPP with density A and 
pathloss exponent /3= 2. All users communicate with BS using 
OFDM with iV=256, L —8 and A'=32 identical pilot symbols 
drawn from a 4-QAM constellation. Fig.[9]compares the mean 
and variance of interference observed on single OFDM carrier 
(randomly picked) due to simulated sources with expressions 
given in Lemma [21 as a function of A. The results indicate a 
close match between simulation and theory. 


D. Experiment 4: MSE Performance under AWGN and Pilot 
Contamination 

In this experiment we study the MSE performance of 
different algorithms in presence of both AWGN and pilot 
contamination. For simulations, we use the parameters given 
in Table HI1 with the interfering users distributed according to a 
PPP of A=0.1 and pathloss 6=2. The desired user is assumed 
lm away from BS located at origin while the interfering 
users are distributed in circular region of radius 5m with 
protection region of 7 0 =2 m. In Fig. |10(a)| the simulated 
MSE performance of different algorithms is compared over 
a wide range of SNR with the analytical expressions given 
in Theorems [I] [2] and [3 (see Section IVl-Ct . From Fig. |10(a)] 
note that all MSE curves decrease with increasing SNR in 
lower range but reach an error floor at higher SNR. This is 
in stark contrast to AWGN case (see Fig. [3, where the MSE 
always decreases with increasing SNR. This shows that pilot 
contamination persists even at higher SNR and its effect on 
MSE is more severe than AWGN. 

We present similar analysis in Fig. |10(b)| where the MSE 
is plotted as a function of A with SNR fixed at 10 dB. It is 
obvious that all algorithms perform well for small values of 
A. However when A increases, the interference due to pilot 
contamination dominates AWGN, thus severely degrading the 
performance as indicated by a sharp increase in MSE curves. 
Note that LMMSE channel estimation is more robust to pilot 
contamination than simple LS based channel estimation. Also 
observe a close match between simulation and theoretical 
analysis, shown in Fig. [TOj over a wide range of A. 

E. Experiment 5: Computational Complexity 

In this experiment we compare the average runtime of 
various algorithms that can be regarded as a measure of 
computational complexity. Fig. [TT] shows the average runtime 
with increasing number of BS antennas under the default 
simulation parameters of Table [TT] It is clear that computa¬ 
tional requirements for proposed D-LMMSE algorithm, with 
different values of parameter D, grow at much slower pace 
than that of the O-LMMSE algorithm as the number of BS 
antenna increases. Further, in terms of memory requirements 
and communication overhead (not shown here), the advantages 
of D-LMMSE are even more tangible. 
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Figure 11: Average runtime of various algorithms. 


VIII. Conclusion 


of interference can be computed as follows, 
4 = E{|Z| 2 } 




cm r 27r rim i 

= A£*E{|^| 2 } / / -^rdrd0 

JO J 7o r 

= 7tA(/ 3 - ir^fiEllxl 2 } 

\7o 7ni / 

where, == is due to the fact that Zi are independent SS random 
variables, in = we employed Campbell’s theorem and in = we 
used the result E{|zj| 2 } = E{a 2 a 2 } = RE{|a;| 2 }, where we 
note that a,; and on are independent random variables, which 
completes the proof. 



Channel estimation is a challenging problem in massive 
MIMO systems as the conventional techniques applicable to 
MIMO systems cannot be employed owing to an exceptionally 
large number of unknown channel coefficients. We proposed 
a distributed algorithm that attains near optimal solution at 
a significantly reduced complexity by relying on coordination 
among antennas. To reduce the pilots overhead, the distributed 
LMMSE algorithm is extended using data-aided estimation 
based on reliable carriers. To gain insight into the effect 
of pilot contamination on channel estimation performance, 
we used the stochastic geometry to obtain the aggregated 
interference power and then based on this, we derived MSE 
expressions for different algorithms under AWGN and pilot 
contaminated scenarios. The derived expressions were verified 
using simulation results. Extending the obtained results to 
analyzing the system throughput under pilot contamination 
remains open for future work. 


Appendix B 
Proof of TheoremQ] 

By replacing R„, with R„. + Rx r in MSE expression of 
we obtain 

mse^ = trace ^A H (R ro + Rj r ) _1 

= trace ^A H (R™ + a\ AR iap A H ) 1 A^ 

= trace(A H R“ 1 A - a 2 A H R- 1 A(R iQ 1 p 
+ a 2 A H R- 1 A)- 1 A H R- 1 A) _1 


Appendix A 

Mean and variance of interference 


The mean of X can be determined as follows, 

\! E x Zi 1 


M x = E{X} = E^ 

lieT' T 
yfE^E z {zi} 

ki€ , T 


H 


= E * E 

l iG'S’ 

*■= \/E^E{zi} [ —ardrdd = 0 
J R2 rP 


where, '= results from Campbell’s theorem ll40l and then the 
fact, E{zi} = 0 yields the zero mean. Similarly, the variance 


where, == follows from matrix inversion lemma. Now, using 
the EVD of the channel correlation matrix R tap = QAQ H 
and the fact that A H R” 1 A = I L we obtain. 




o 2 x KE x \ 

<4 J 



-1 


(53) 


where, = follows from the property that trace (QRQ h ) = 
trace(R) if Q is unitary. After simple algebraic manipulations, 
the term inside the summation simplifies to E " E i 

which completes the proof. 
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Appendix C 
Proof of Theorem[3] 

Under both AWGN and pilot contamination, we replace R u , 
with R £ = R. u , + ctjAR hA H to get, 

MSE (0) = trace ^RU+A h ^R.^,4-crf AR h A H ^ 1 A 
= trace ^R h 1 + AR h A H - a\ AR h A H (R h 1 

+ a|AR h A H ) ^RhA 11 

where == follows from matrix inversion lemma. Using 
the properties of kronecker product, it can be shown that 
ARhA H = ^§£-(1 R (g) I l). Further, the channel correla¬ 
tion matrix Rh = R array 0 Rtap can be decomposed as 
Rh = (V ® Q)(S 0 A)(V g) Q) H , where we introduced the 
EVDs, Rarray = VSV H and Rt a p = QAQ H . Incorporating 

these results in '= yields. 




MSE (0) = tracefs^ 1 gA- 1 +^^(I K gI L )-a| ^ 


KE, 


•IS" 1 ® A 


R L 


= ££' 

3 =1 *=1 


1 


KE X _ 2 / KE, 

HjSi al CTl 


R <X> Ii 
2 


-1 


V 


f 1 


a^KE x 




where, = follows from property, trace (QRQ h ) =trace(R) 

( c ) 

when Q is unitary and = is due to the diagonal nature of the 
matrix inside the trace operator, where fij and 6, represent the 
eigenvalues of matrices R„ r ray and R fop respectively. After 

(c) 

some algebraic manipulations, = simplifies to the result given 
in Theorem [3] 
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